Abstract: We investigate distributed source coding of two 
correlated sources X and Y where messages are passed to a 
decoder in a cascade fashion. The encoder of X sends a message 
at rate $R_1$ to the encoder of Y. The encoder of Y then sends 
a message to the decoder at rate $R_2$ based both on Y and 
on the message it received about X. The decoder's task is to 
estimate a function of X and Y. For example, we consider the 
minimum mean squared-error distortion when encoding the sum 
of jointly Gaussian random variables under these constraints. We 
also characterize the rates needed to reconstruct a function of 
X and Y losslessly. 

Our general contribution toward understanding the limits 
of the cascade multiterminal source coding network is in the 
form of inner and outer bounds on the achievable rate region 
for satisfying a distortion constraint for an arbitrary distortion 
function d(x,y,z). The inner bound makes use of a balance 
between two encoding tactics — relaying the information about X 
and recompressing the information about X jointly with Y. In 
the Gaussian case, a threshold is discovered for identifying which 
of the two extreme strategies optimizes the inner bound. Relaying 
outperforms recompressing the sum at the relay for some rate 
pairs if the variance of X is greater than the variance of Y. 

I. Introduction 

Distributed data collection, such as aggregating measure- 
ments in a sensor network, has been investigated from many 
angles [1]. Various algorithms exist for passing messages to 
neighbors in order to collect information or compute functions 
of data. Here we join in the investigation of the minimum 
descriptions needed to quantize and collect data in a network, 
and we do so by studying a particular small network. These 
results provide insight for optimal communication strategies 
in larger networks. 

In the network considered here, two sources of information 
are to be described by separate encoders and passed to a 
single decoder in a cascade fashion. That is, after receiving a 
message from the first encoder, the second encoder creates a 
final message that summarizes the information available about 
both sources and sends it to the decoder. We refer to this setup 
as the cascade multiterminal source coding network, shown 
in Figure 1. Discrete i.i.d. sources $X_i \in \mathcal{X}$ and $Y_i \in \mathcal{Y}$ are 
jointly distributed according to the probability mass function 
$p_0(x,y)$. Encoder 1 summarizes a block of $n$ symbols $X^n$ 
with a message $I \in \{1, \ldots, 2^{nR_1}\}$ and sends it to Encoder 
2. After receiving the message, Encoder 2 sends an index 
$J \in \{1, \ldots, 2^{nR_2}\}$ to describe what it knows about both sources 
to the decoder, based on the message $I$ and on the observations 
$Y^n$. The decoder then uses the index $J$ to construct a sequence 
$Z^n$, where each $Z_i$ is an estimate of a desired function of $X_i$ 
and $Y_i$. 

Fig. 1. Cascade Multiterminal Source Coding. The i.i.d. source sequences 
$X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are jointly distributed according to $p_0(x,y)$. 
Encoder 1 sends a message $I$ about the sequence $X_1, \ldots, X_n$ at rate $R_1$ 
to Encoder 2. The second encoder then sends a message $J$ about both 
source sequences at rate $R_2$ to the decoder. We investigate the rates required 
to produce a sequence $Z_1, \ldots, Z_n$ with various goals in mind, such as 
reconstructing estimates of $X^n$ or $Y^n$ or a function of the two. 



For example, consider the lossless case. Suppose we wish to 
compute a function of X and Y in the cascade multiterminal 
source coding network. What rates are needed to reliably 
calculate $Z_i = f(X_i, Y_i)$ at the decoder? Computing functions 
of observations in a network has been considered in various 
other settings, such as the two-node back-and-forth setting 
of [2] and the multiple access channel setting in [3]. In the 
cascade multiterminal network, the answer breaks down quite 
intuitively. For the message from Encoder 1 to Encoder 2, use 
Wyner-Ziv encoding [4] to communicate the function values. 
Then apply lossless compression to the function values at 
Encoder 2. Computing functions of data in a Wyner-Ziv setting 
was introduced by Yamamoto [5], and the optimal rate for 
lossless computation was shown by Orlitsky and Roche [6] to 
be the conditional graph entropy on an appropriate graph. 

A particular function for which the optimal rates are easy to 
identify is the encoding of binary sums of binary symmetric 
X and Y that are equal with probability p, as proposed by 
Korner and Marton [7]. For this computation, the required 
rates are $R_1 \ge h(p)$ and $R_2 \ge h(p)$, where $h$ is the binary 
entropy function. Curiously, the same rates are required in the 
standard multiterminal source coding setting. 
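
The rate threshold h(p) quoted above is easy to evaluate numerically. The following is a minimal sketch, not part of the original analysis; the function names `binary_entropy` and `korner_marton_rates` are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def korner_marton_rates(p):
    """Rate thresholds (R1, R2) for the binary-sum example, both equal to h(p)."""
    h = binary_entropy(p)
    return h, h


# Example: the thresholds for a few values of p (h is symmetric about 1/2).
for p in (0.05, 0.11, 0.5):
    print(p, korner_marton_rates(p))
```

Because h(p) = h(1 - p), the thresholds are the same whether p is read as the probability that the two symbols agree or that they differ.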

Encoding of information sources at separate encoders has 
attracted sustained attention in the information theory community. 
The results of Slepian-Wolf encoding and communication 
through the Multiple Access Channel (MAC) are 
surprising and encouraging. Slepian and Wolf [8] showed that 
separate encoders can compress correlated sources losslessly 
at the same rate as a single encoder. Ahlswede [9] and Liao 
[10] fully characterized the capacity region for the general 
memoryless MAC, making it the only multi-user memoryless 
channel setting that is solved in its full generality. Thus, the 
feasibility of describing two independent data sources without 
loss through a noisy channel with interference to a single 
decoder is solved. 

Beyond the two cases mentioned, slight variations to the 
scenario result in a multitude of open problems in distributed 
source coding. For example, the feasibility of describing two 
correlated data sources through a noisy MAC is not solved. 
Furthermore, allowing the source coding to be done with 
loss raises even more uncertainty. Berger and Tung [11] first 
considered the multiterminal source coding problem, where 
correlated sources are encoded separately with loss. Even 
when no noisy channel is involved, the optimal rate region 
is not known, but progress continues [12], [13]. 

The cascade multiterminal source coding setting is similar to 
multiterminal source coding considered by Berger and Tung in 
that two sources of information are encoded in a distributed 
fashion with loss. The difference is that communication 
between the source encoders in this network replaces one of 
the direct channels to the decoder. Thus, joint encoding is 
enabled to a degree, but the downside is that any message 
from Encoder 1 to the Decoder must now cross two links. 

The general cascade multiterminal source coding problem 
includes many interesting variations. The decoder may need 
to estimate both X and Y, X only, Y only, or some other 
function of both, such as the sum of two jointly Gaussian 
random variables, considered in Section V-A. Vasudevan, Tian, 
and Diggavi [14] looked at a similar cascade communication 
system with a relay. In their setting, the decoder has side 
information, and the relay has access to a physically degraded 
version of it. Because of the degradation, the decoder knows 
everything it needs about the relay's side information, so the 
relay does not face the dilemma of mixing some of the 
side information into its outgoing message. In the cascade 
multiterminal source coding setting of this paper, the decoder 
does not have side information. Thus, the relay is faced with 
coalescing the two pieces of information into a single message. 
Other research involving similar network settings can be found 
in [15], where Gu and Effros consider a more general network 
but with the restriction that the information Y is a function 
of the information X, and [16], where Bakshi et al. identify 
the optimal rate region for lossless encoding of independent 
sources in a longer cascade (line) network. 

In this paper we present inner and outer bounds on the 
general rate-distortion region for the cascade multiterminal 
source coding problem. The inner bound addresses the chal- 
lenge of compressing a sequence that is itself the result of a 
lossy compression. Then we consider specific cases, such as 
encoding the sum of jointly Gaussian random variables, com- 
puting functions, and even coordinating actions. The bounds 
are tight for computing functions and achieving some types of 
coordinated actions. 

II. Problem Specifics 

The encoding of source symbols into messages is described 
in detail in the introduction and is depicted in Figure 1. 



A. Objective 

The goal is for $X^n$, $Y^n$, and $Z^n$ to satisfy an average letter- 
by-letter distortion constraint $D$ with high probability. A finite 
distortion function $d(x, y, z)$ specifies the penalty incurred 
for any triple $(x, y, z)$. Therefore, the objective is to reliably 
produce a sequence $Z^n$ that satisfies 

$$\frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_i, Z_i) \le D. \tag{1}$$



Due to the flexibility in defining the distortion function 
$d$, the decoded sequence $Z^n$ can play a number of different 
roles. If the goal is to estimate both sources X and Y with 
a distortion constraint, then $Z = (\hat{X}, \hat{Y})$ encompasses both 
estimates, and $d$ can be defined accordingly. Alternatively, the 
decoder may only need to estimate X, in which case Y acts 
as side information at a relay, and $Z = \hat{X}$. In general, the 
decoder could produce estimates of any function of X and Y. 

B. Rate-Distortion Region 

The triple $(R_1, R_2, D)$ is an achievable rate-distortion triple 
for the distortion function $d$ and source distribution $p_0(x,y)$ 
if the following holds: 

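For all $\epsilon > 0$, there exist $n \in \{1, 2, \ldots\}$ and functions 

$$i : \mathcal{X}^n \to \{1, \ldots, 2^{nR_1}\},$$
$$j : \mathcal{Y}^n \times \{1, \ldots, 2^{nR_1}\} \to \{1, \ldots, 2^{nR_2}\},$$
$$z^n : \{1, \ldots, 2^{nR_2}\} \to \mathcal{Z}^n,$$

such that 

$$\Pr\left( \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_i, Z_i) > D \right) < \epsilon,$$

where 

$$Z^n = z^n(j(Y^n, i(X^n))).$$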
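
The composition $Z^n = z^n(j(Y^n, i(X^n)))$ can be made concrete with a toy example. The sketch below is only an illustration of the message flow, not a rate-efficient code: the uncoded encoders `toy_enc1`, `toy_enc2`, `toy_dec` and the distortion `xor_distortion` are hypothetical, and messages are represented as tuples rather than integer indices.

```python
def run_cascade(xs, ys, enc1, enc2, dec):
    """Apply the cascade: Encoder 1 sees xs, Encoder 2 sees ys and message I."""
    message_i = enc1(xs)             # I = i(X^n)
    message_j = enc2(ys, message_i)  # J = j(Y^n, I)
    return dec(message_j)            # Z^n = z^n(J)


def average_distortion(d, xs, ys, zs):
    """Average letter-by-letter distortion (1/n) * sum of d(x_i, y_i, z_i)."""
    return sum(d(x, y, z) for x, y, z in zip(xs, ys, zs)) / len(xs)


# Hypothetical uncoded scheme for binary sources (1 bit per symbol per link).
def toy_enc1(xs):
    return tuple(xs)


def toy_enc2(ys, message_i):
    # Encoder 2 combines its own observations with the relayed bits.
    return tuple(x ^ y for x, y in zip(message_i, ys))


def toy_dec(message_j):
    return list(message_j)


def xor_distortion(x, y, z):
    """Hamming distortion against the target function f(x, y) = x XOR y."""
    return 0 if z == (x ^ y) else 1


xs = [0, 1, 1, 0]
ys = [1, 1, 0, 0]
zs = run_cascade(xs, ys, toy_enc1, toy_enc2, toy_dec)
print(zs, average_distortion(xor_distortion, xs, ys, zs))
```

The point of the sketch is structural: the decoder never sees Encoder 1's message directly, so anything it learns about $X^n$ must pass through Encoder 2's output.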



The rate-distortion region $\mathcal{R}$ for a particular source joint 
distribution $p_0(x,y)$ and distortion function $d$ is the closure 
of achievable rate-distortion triples, given as 

$$\mathcal{R} = \mathrm{Cl}\{\text{achievable } (R_1, R_2, D) \text{ triples}\}. \tag{2}$$

III. General Inner Bound 



The cascade multiterminal source coding network presents 
an interesting dilemma. Encoder 2 has to summarize both 
the source sequence $Y^n$ and the message $I$ that describes 
$X^n$. Intuition from related information theory problems, like 
Wyner-Ziv coding, suggests that for efficient communication 
the message $I$ from Encoder 1 to Encoder 2 will result 
in a phantom sequence of auxiliary random variables that 
are jointly typical with $X^n$ and $Y^n$ according to a selected 
joint distribution. The second encoder could jointly compress 
the source sequence $Y^n$ along with the auxiliary sequence, 
treating it as if it were also a random source sequence. But this 
is too crude. A lot is known about the auxiliary sequence, such 
as the codebook it came from, allowing it to be summarized 
more easily than this approach would allow. In some situations 
it proves more efficient to simply pass the description from 
Encoder 1 straight to the Decoder rather than to treat it as a 
random source and recompress at the second encoder. 

While still allowing the message $I$ from Encoder 1 to be 
associated with a codebook of auxiliary sequences, we would 
like to take advantage of the sparsity of the codebook as we 
form a description at Encoder 2. One way to accommodate 
this is to split the message from Encoder 1 into two parts. 
One part is forwarded by Encoder 2, and the other part is 
decoded by Encoder 2 into a sequence of auxiliary variables 
and compressed with $Y^n$ as if it were a random source 
sequence. The forwarded message keeps its sparse codebook 
intact, while the decoded and recompressed message enjoys 
the efficiency that comes with being bundled with $Y^n$. This 
results in an inner bound $\mathcal{R}_{in}$ for the rate-distortion region 
$\mathcal{R}$. The definition of $\mathcal{R}_{in}$ is given in (3). 
The region $\mathcal{R}_{in}$ is already convex (for fixed $p_0(x, y)$ and 
$d$), so there is no need to convexify using time-sharing. 

Theorem 3.1 (Inner bound): The rate-distortion region $\mathcal{R}$ 
for the cascade multiterminal source coding network of Figure 
1 contains the region $\mathcal{R}_{in}$. Every rate-distortion triple in $\mathcal{R}_{in}$ 
is achievable. That is, 

$$\mathcal{R} \supset \mathcal{R}_{in}. \tag{5}$$

Proof: For lack of space, we give only a description of 
the encoding and decoding strategies involved in the proof and 
skip the probability of error analysis. We use familiar 
techniques of randomized codebook construction, jointly typical 
encoding, and binning. 

For any rate-distortion triple in $\mathcal{R}_{in}$ there is an associated 
joint distribution $p(x,y,z,u,v)$ that satisfies the inequalities 
in (3). Construct three sets of codebooks, $\mathcal{C}_U$, $\mathcal{C}_{V,i}$, and $\mathcal{C}_{Z,i}$ 
for $i = 1, 2, \ldots, |\mathcal{C}_U|$, where 

$$\mathcal{C}_U = \{u^n(i)\}_{i=1}^{m_1},$$
$$\mathcal{C}_{V,i} = \{v^n(j,i)\}_{j=1}^{m_2},$$
$$\mathcal{C}_{Z,i} = \{z^n(k,i)\}_{k=1}^{m_3}.$$

Let $m_1 = 2^{n(I(X;U)+\epsilon)}$, $m_2 = 2^{n(I(X;V|U)+\epsilon)}$, and 
$m_3 = 2^{n(I(Y,V;Z|U)+\epsilon)}$. 

Randomly generate the sequences $u^n(i) \in \mathcal{C}_U$ i.i.d. according 
to $p(u)$, independent for each $i$. Then for each $i$ 
and $j$, independently generate the sequences $v^n(j,i) \in \mathcal{C}_{V,i}$ 
conditioned on $u^n(i) \in \mathcal{C}_U$ symbol-by-symbol according to 
$p(v|u)$. Similarly, for each $i$ and $k$, independently generate 
the sequences $z^n(k,i) \in \mathcal{C}_{Z,i}$ conditioned on $u^n(i) \in \mathcal{C}_U$ 
symbol-by-symbol according to $p(z|u)$. 

Finally, assign bin numbers. For every sequence $u^n(i) \in \mathcal{C}_U$ 
assign a random bin $b_U(i) \in \{1, \ldots, 2^{n(I(X;U|Y)+2\epsilon)}\}$. Also, 
for each $i$ and each $v^n(j,i) \in \mathcal{C}_{V,i}$ assign a random bin 
$b_V(j,i) \in \{1, \ldots, 2^{n(I(X;V|Y,U)+2\epsilon)}\}$. 

Successful encoding and decoding is as follows. Encoder 
1 first finds a sequence $u^n(i) \in \mathcal{C}_U$ that is $\epsilon$-jointly typical 
with $X^n$ with respect to $p(x,u)$. Then Encoder 1 finds a 
sequence $v^n(j,i) \in \mathcal{C}_{V,i}$ that is $\epsilon$-jointly typical with the 
pair $(X^n, u^n(i))$ with respect to $p(x,u,v)$. Finally, Encoder 1 
sends the bin numbers $b_U(i)$ and $b_V(j,i)$ to Encoder 2. 

Encoder 2 considers all codewords in $\mathcal{C}_U$ with bin number 
$b_U(i)$ and finds that only $u^n(i)$ is $\epsilon$-jointly typical with 
$Y^n$ with respect to $p(y,u)$. Then Encoder 2 considers all 
codewords in $\mathcal{C}_{V,i}$ with bin number $b_V(j,i)$ and finds that 
only $v^n(j,i)$ is $\epsilon$-jointly typical with the pair $(Y^n, u^n(i))$ 
with respect to $p(y,u,v)$. Finally, Encoder 2 finds a 
sequence $z^n(k,i) \in \mathcal{C}_{Z,i}$ that is $\epsilon$-jointly typical with the triple 
$(Y^n, u^n(i), v^n(j,i))$ with respect to $p(y,u,v,z)$ and sends 
both $i$ and $k$ to the Decoder. 

The decoder produces $Z^n = z^n(k,i)$. Due to the Markov 
Lemma [11] and the structure of $p(x,y,z,u,v)$, the triple 
$(X^n, Y^n, Z^n)$ will be $\epsilon$-jointly typical with high probability. 
Finally, $\epsilon$ can be chosen small enough to satisfy the rate and 
distortion inequalities. ■ 

IV. General Outer Bound 

Theorem 4.1 (Outer bound): The rate-distortion region $\mathcal{R}$ 
for the cascade multiterminal source coding network of Figure 
1 is contained in the region $\mathcal{R}_{out}$ defined in (4). Rate-distortion 
triples outside of $\mathcal{R}_{out}$ are not achievable. That is, 

$$\mathcal{R} \subset \mathcal{R}_{out}. \tag{6}$$

Proof: Identify the message $I$ from Encoder 1 along 
with the past and future variables in the sequence $Y^n$ as the 
auxiliary random variable $U$. ■ 



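$$\mathcal{R}_{in} \triangleq \left\{ (R_1, R_2, D) : 
\begin{array}{l}
\exists\, p(x,y,z,u,v) = p_0(x,y)\, p(u,v|x)\, p(z|y,u,v) \text{ such that} \\
D \ge E(d(X,Y,Z)), \\
R_1 \ge I(X;U,V|Y), \\
R_2 \ge I(X;U) + I(Y,V;Z|U).
\end{array} \right\} \tag{3}$$

$$\mathcal{R}_{out} \triangleq \left\{ (R_1, R_2, D) : 
\begin{array}{l}
\exists\, p(x,y,z,u) = p_0(x,y)\, p(u|x)\, p(z|y,u) \text{ such that} \\
D \ge E(d(X,Y,Z)), \\
R_1 \ge I(X;U|Y), \\
R_2 \ge I(X,Y;Z).
\end{array} \right\} \tag{4}$$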



V. Special Cases 

A. Sum of Jointly Gaussian 

Suppose we wish to encode two jointly Gaussian data 
sources at Encoder 1 and Encoder 2 in order to produce an 
estimate of the sum at the decoder with small mean squared-error 
distortion. Let X and Y be zero-mean jointly Gaussian 
random variables, where X has variance $P_X$, Y has variance 
$P_Y$, and their correlation coefficient is $\rho = \frac{E(XY)}{\sqrt{P_X P_Y}}$. 

1) Inner bound: We can explore the region $\mathcal{R}_{in}$ by optimizing 
over jointly Gaussian random variables $U$, $V$, and $Z$ to find 
achievable rate-distortion triples $(R_1, R_2, D)$. This restricted 
search might not find all extremal rate-distortion points in $\mathcal{R}_{in}$; 
still it provides an inner bound on the rate-distortion region.¹ 

The optimization of $\mathcal{R}_{in}$ with the restriction of only considering 
jointly Gaussian distributions $p(x, y, z, u, v)$ leads to two 
contrasting strategies depending on the variances $P_X$ and $P_Y$ 
of the sources and the rate $R_1$. The two encoding strategies 
employed are to either forward the message from Encoder 1 to 
the Decoder, or to use the message to construct an estimate $\hat{X}^n$ 
at Encoder 2 and then compress the vector sum $\hat{X}^n + Y^n$ and 
send it to the Decoder, but not both. In other words, either 
let $V = \emptyset$ (forward only) or let $U = \emptyset$ (recompress only). 
The determining factor for deciding which method to use is a 
comparison of the rate $R_1$ with the quantity $\frac{1}{2} \log_2 \frac{P_X}{P_Y}$. 

Case 1: (Recompress) 

$$R_1 \ge \frac{1}{2} \log_2 \frac{P_X}{P_Y}.$$

If the rate $R_1$ is large enough, then the optimal encoding 
method is to recompress at Encoder 2. This will allow for 
a more efficient encoding of the sum in the second message 
$J$ rather than encoding two components of the estimate 
separately. 

The distortion in this case is 

$$D = (1-\rho^2)\left(1 - 2^{-2R_2}\right) 2^{-2R_1} P_X + 2^{-2R_2} P_{X+Y}, \tag{7}$$

where $P_{X+Y}$ is the variance of the sum $X + Y$. 
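
As a numerical illustration of the recompress case, the sketch below evaluates the distortion taking (7) to read $D = (1-\rho^2)(1-2^{-2R_2})2^{-2R_1}P_X + 2^{-2R_2}P_{X+Y}$, together with the threshold $\frac{1}{2}\log_2(P_X/P_Y)$ on $R_1$. The function names are hypothetical, and the parameter values are arbitrary choices for illustration.

```python
import math


def sum_variance(p_x, p_y, rho):
    """Variance of X + Y for zero-mean X, Y with correlation coefficient rho."""
    return p_x + p_y + 2.0 * rho * math.sqrt(p_x * p_y)


def recompress_threshold(p_x, p_y):
    """The quantity (1/2) log2(P_X / P_Y) that the rate R_1 is compared with."""
    return 0.5 * math.log2(p_x / p_y)


def recompress_distortion(r1, r2, p_x, p_y, rho):
    """Distortion of the recompress strategy, as read from (7)."""
    # Error in X remaining at Encoder 2 after a rate-r1 description, given Y.
    relay_error = (1.0 - rho ** 2) * 2.0 ** (-2.0 * r1) * p_x
    shrink = 2.0 ** (-2.0 * r2)
    return (1.0 - shrink) * relay_error + shrink * sum_variance(p_x, p_y, rho)


# Example with P_X < P_Y, so the threshold is negative and every R_1 >= 0
# falls in the recompress case.
p_x, p_y, rho = 1.0, 2.0, 0.5
print(recompress_threshold(p_x, p_y))
print(recompress_distortion(1.0, 2.0, p_x, p_y, rho))
```

Two sanity checks follow directly from the formula: with zero rate on both links the distortion equals $P_{X+Y}$, and as $R_2$ grows the distortion approaches the error already present at Encoder 2.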
Case 2: (Forward) 

$$R_1 < \frac{1}{2} \log_2 \frac{P_X}{P_Y}.$$

If the variance of X is larger than that of Y and the rate $R_1$ is 
small, then the optimal encoding method is to forward the 
message $I$ from Encoder 1 to the Decoder without changing 
it. By rearranging the inequality, we see that $2^{-2R_1} P_X > P_Y$. 



¹ To perform this optimization, first note that the marginal distribution 
$p(x, y, u)$ determines the quantities $I(X; U)$ and $I(X; U|Y)$, and $p(x, y, u)$ 
only has one significant free parameter due to Markovity. All remaining 
quantities that define the region $\mathcal{R}_{in}$ are conditioned on $U$, including the final 
estimate at the decoder since $U$ is available to the decoder. Therefore, after 
fixing $p(x, y, u)$ we can remove $U$ entirely from the optimization problem 
by exploiting the idiosyncrasies of the jointly Gaussian distribution. Namely, 
reduce the rates $R_1$ and $R_2$ appropriately and solve the problem without $U$ 
with $\tilde{X}$ replacing $X$ and $\tilde{Y}$ replacing $Y$, where $\tilde{X}$ is the error in estimating 
$X$ with $U$, and $\tilde{Y}$ is the error in estimating $Y$ with $U$. This greatly reduces 
the dimensionality of the problem. 



From rate-distortion theory we know that $2^{-2R_1} P_X$ is the 
mean squared-error that results from compressing X at rate 
$R_1$. The fact that the variance of the error introduced by the 
compression at Encoder 1 is larger than the variance of Y 
subtly indicates that the description of X was more efficiently 
compressed by Encoder 1 than it would be if mixed with Y 
and recompressed. 

The estimate of X from Encoder 1, represented by $U$, which 
is forwarded by Encoder 2, might be limited by either $R_1$ 
or $R_2$. In the case that $R_2$ is completely saturated with the 
description of $U$ at rate $I(X; U)$, there is no use trying to use 
any excess rate $R_1 - I(X; U|Y)$ from Encoder 1 to Encoder 
2 because it will have no way of reaching the decoder. On 
the other hand, in the case that $R_1$ is the limiting factor for 
the description of $U$ at rate $I(X; U|Y)$, then the excess rate 
$R_2 - I(X; U)$ can be used to describe Y to the decoder. We 
state the distortion separately for each of these cases. 



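If $R_2 < \frac{1}{2} \log_2 \left( \frac{2^{2R_1} - \rho^2}{1 - \rho^2} \right)$ then, 

$$D = 2^{-2R_2} \left( P_{X+Y} + (1-\rho^2)\left(2^{2R_2} - 1\right) P_Y \right).$$

If $R_2 > \frac{1}{2} \log_2 \left( \frac{2^{2R_1} - \rho^2}{1 - \rho^2} \right)$ then, 

$$D = \left( (1-\rho^2)\, 2^{-2R_1} - \left(1 - \rho^2 2^{-2R_1}\right) 2^{-2R_2} \right) P_X + 2^{-2R_2} \left( P_{X+Y} + \left(2^{2R_1} - 1\right) P_Y \right).$$

Again, $P_{X+Y}$ is the variance of the sum $X + Y$. 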
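
The two forward-case expressions can be compared numerically with the recompress strategy. The sketch below assumes the forward distortions read $D = 2^{-2R_2}(P_{X+Y} + (1-\rho^2)(2^{2R_2}-1)P_Y)$ below the saturation rate $\frac{1}{2}\log_2\frac{2^{2R_1}-\rho^2}{1-\rho^2}$ and $D = ((1-\rho^2)2^{-2R_1} - (1-\rho^2 2^{-2R_1})2^{-2R_2})P_X + 2^{-2R_2}(P_{X+Y} + (2^{2R_1}-1)P_Y)$ above it, and that (7) gives the recompress distortion. All function names and parameter values are hypothetical illustrations.

```python
import math


def sum_variance(p_x, p_y, rho):
    """Variance of X + Y for zero-mean X, Y with correlation coefficient rho."""
    return p_x + p_y + 2.0 * rho * math.sqrt(p_x * p_y)


def forward_saturation_rate(r1, rho):
    """Rate I(X;U) consumed on the second link when R_1 is the bottleneck."""
    return 0.5 * math.log2((2.0 ** (2.0 * r1) - rho ** 2) / (1.0 - rho ** 2))


def forward_distortion(r1, r2, p_x, p_y, rho):
    """Distortion of the forward strategy in its two regimes (|rho| < 1)."""
    p_sum = sum_variance(p_x, p_y, rho)
    if r2 <= forward_saturation_rate(r1, rho):
        # R_2 is saturated by the forwarded description of X.
        return 2.0 ** (-2.0 * r2) * (
            p_sum + (1.0 - rho ** 2) * (2.0 ** (2.0 * r2) - 1.0) * p_y)
    # R_1 limits the description of X; leftover R_2 describes Y.
    first = ((1.0 - rho ** 2) * 2.0 ** (-2.0 * r1)
             - (1.0 - rho ** 2 * 2.0 ** (-2.0 * r1)) * 2.0 ** (-2.0 * r2)) * p_x
    second = 2.0 ** (-2.0 * r2) * (p_sum + (2.0 ** (2.0 * r1) - 1.0) * p_y)
    return first + second


def recompress_distortion(r1, r2, p_x, p_y, rho):
    """Distortion of the recompress strategy, as read from (7)."""
    relay_error = (1.0 - rho ** 2) * 2.0 ** (-2.0 * r1) * p_x
    shrink = 2.0 ** (-2.0 * r2)
    return (1.0 - shrink) * relay_error + shrink * sum_variance(p_x, p_y, rho)


# P_X = 4 P_Y, so the threshold (1/2) log2(P_X / P_Y) equals 1 bit.
p_x, p_y, rho, r2 = 4.0, 1.0, 0.0, 2.0
for r1 in (0.5, 2.0):  # below and above the threshold
    print(r1,
          forward_distortion(r1, r2, p_x, p_y, rho),
          recompress_distortion(r1, r2, p_x, p_y, rho))
```

In this example, forwarding gives the smaller distortion for the rate below the threshold and recompressing gives the smaller distortion for the rate above it, in line with the case split described in the text.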

2) Outer bound: The outer bound $\mathcal{R}_{out}$ is optimized with 
Gaussian auxiliary random variables. However, for simplicity, 
we optimize an even looser bound by minimizing $R_1$ and $R_2$ 
separately (cut-set bound) for a given distortion constraint. The 
result is the following lower bound on distortion. 

$$D \ge \max\left\{ 2^{-2R_1} (1 - \rho^2) P_X,\; 2^{-2R_2} P_{X+Y} \right\}. \tag{8}$$

3) Sum-Rate: Consider the sum-rate $R_1 + R_2$ required to 
achieve a given distortion level $D$. We can compare the sum- 
rate-distortion function $R(D)$ for the inner and outer bounds. 

Let $P_X < P_Y$. This puts us in the recompress regime of 
the inner bound. By optimizing (7) subject to $R_1 + R_2 = R$, 
we find that the optimal values $R_1^*$ and $R_2^*$ satisfy 

$$R_2^* - R_1^* = \frac{1}{2} \log_2 \left( \frac{P_{X+Y}}{(1-\rho^2) P_X} \right),$$

as long as $R$ is greater than the right-hand side. Notice that 
$R_2$ is more useful than $R_1$, as we might expect. From this 
we find a piece-wise upper bound on the sum-rate-distortion 
function. Similarly we find a piece-wise lower bound based 
on (8). 

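Sum-rate upper bound. Low distortion region: if 

$$D < (1-\rho^2) P_X \left[ 2 - \frac{(1-\rho^2) P_X}{P_{X+Y}} \right],$$

then 

$$R(D) \le \frac{1}{2} \log_2 \left( \frac{(1-\rho^2) P_X}{D} \right) + \frac{1}{2} \log_2 \left( \frac{P_{X+Y}}{D} \right) + \log_2 \left( 1 + \sqrt{1 - \frac{D}{P_{X+Y}}} \right).$$

High distortion region (up to $D < P_{X+Y}$): 

$$R(D) \le \frac{1}{2} \log_2 \left( \frac{P_{X+Y} - (1-\rho^2) P_X}{D - (1-\rho^2) P_X} \right).$$

Sum-rate lower bound. Low distortion region: if 

$$D < (1-\rho^2) P_X,$$

then 

$$R(D) \ge \frac{1}{2} \log_2 \left( \frac{(1-\rho^2) P_X P_{X+Y}}{D^2} \right).$$

High distortion region (up to $D < P_{X+Y}$): 

$$R(D) \ge \frac{1}{2} \log_2 \left( \frac{P_{X+Y}}{D} \right).$$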

Lemma 5.1: The gap between the upper and lower bounds 
on the optimal sum-rate (derived from $\mathcal{R}_{in}$ and $\mathcal{R}_{out}$) needed 
to encode the sum of jointly Gaussian sources in the cascade 
multiterminal network with a squared-error distortion 
constraint $D$ is no more than 1 bit, shrinking as $D$ increases, 
for any jointly Gaussian sources satisfying $P_X < P_Y$. 
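
The lemma can be spot-checked numerically. The sketch below is not a proof: it assumes the recompress distortion of (7) and the cut-set bound of (8), searches a grid of rate splits for one fixed sum-rate, and compares the result with the smallest sum-rate permitted by (8) at the same distortion. The names `best_split` and `cutset_sum_rate`, the grid size, and the parameter values are all hypothetical.

```python
import math


def recompress_distortion(r1, r2, a, b):
    """Distortion from (7), with a = (1 - rho^2) P_X and b = P_{X+Y}."""
    return (1.0 - 2.0 ** (-2.0 * r2)) * a * 2.0 ** (-2.0 * r1) + 2.0 ** (-2.0 * r2) * b


def best_split(total_rate, a, b, steps=20001):
    """Grid search over R_1 in [0, R] with R_2 = R - R_1; returns (r1, r2, D)."""
    def dist(k):
        r1 = total_rate * k / (steps - 1)
        return recompress_distortion(r1, total_rate - r1, a, b)

    k_best = min(range(steps), key=dist)
    r1 = total_rate * k_best / (steps - 1)
    return r1, total_rate - r1, dist(k_best)


def cutset_sum_rate(d, a, b):
    """Smallest R_1 + R_2 compatible with D >= max{2^-2R1 a, 2^-2R2 b}."""
    r1_min = max(0.0, 0.5 * math.log2(a / d))
    r2_min = max(0.0, 0.5 * math.log2(b / d))
    return r1_min + r2_min


# Example with P_X < P_Y (recompress regime).
p_x, p_y, rho = 1.0, 2.0, 0.5
a = (1.0 - rho ** 2) * p_x
b = p_x + p_y + 2.0 * rho * math.sqrt(p_x * p_y)
total_rate = 3.0

r1, r2, d = best_split(total_rate, a, b)
gap = total_rate - cutset_sum_rate(d, a, b)
print("R2* - R1* =", r2 - r1, "predicted", 0.5 * math.log2(b / a))
print("distortion =", d, "sum-rate gap (bits) =", gap)
```

For these parameters the grid optimum reproduces the predicted difference between the two rates up to the grid resolution, and the gap to the cut-set sum-rate stays below 1 bit.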

B. Computing a Function 

Instead of estimating a function of X and Y, we might 
want to compute a function exactly. Here we show that the 
bounds $\mathcal{R}_{in}$ and $\mathcal{R}_{out}$ are tight for this lossless case.² To do 
so, we consider an arbitrary point $(R_1, R_2, D) \in \mathcal{R}_{out}$ and its 
associated distribution $p(x,y,z,u)$. For the inner bound $\mathcal{R}_{in}$ 
we use the same distribution $p$; however, let $U = \emptyset$ and $V$ take 
the role of $U$ from the outer bound. Notice that the Markovity 
constraints are satisfied. Now consider 

$$\begin{aligned}
I(Y,V;Z) &= H(Z) - H(Z|Y,V) \\
&= H(Z) - H(Z|Y,V,X) \\
&= H(Z) \\
&= I(X,Y;Z),
\end{aligned}$$

due to the Markovity constraint $X - (Y, V) - Z$ and the fact that 
Z is a function of X and Y. Therefore, for this distribution 
$p$, all of the inequalities in $\mathcal{R}_{in}$ are satisfied for the point 
$(R_1, R_2, D)$. 

The outer bound $\mathcal{R}_{out}$ makes it clear that optimal encoding 
is achieved by using Wyner-Ziv encoding from Encoder 1 
to compute the value of the function Z at Encoder 2. This 
optimization is carefully investigated in [6] and equated to a 
graph entropy problem. Then Encoder 2 compresses Z to the 
entropy limit. 

C. Markov Coordination 

It is possible to talk about achieving a joint distribution 
of coordinated actions $p(x,y,z) = p_0(x,y) p(z|x,y)$ without 
referring to a distortion function, as in [18]. Under some 
conditions on the joint distribution, the bounds $\mathcal{R}_{in}$ and $\mathcal{R}_{out}$ 
are tight. One obvious condition is when X, Y, and Z form 
the Markov chain $X - Y - Z$. In this case, there is no need to 
send a message $I$ from Encoder 1, and the only requirement 
for achievability is that $R_2 \ge I(Y; Z)$. 

² The optimal rate region for computing functions of data in the standard 
multiterminal source coding network is currently an open problem [17]. 

Another class of joint distributions $p_0(x, y) p(z|x, y)$ for 
which the rate bounds are provably tight is all distributions 
forming the Markov chain $Y - X - Z$. This encompasses the 
case where Y is a function of X, as in [15]. To prove that the 
bounds are tight, choose $U = Z$ and $V = \emptyset$ for $\mathcal{R}_{in}$. We find 
that rate pairs satisfying 

$$R_1 \ge I(X;Z|Y), \tag{9}$$
$$R_2 \ge I(X;Z), \tag{10}$$

are achievable. And all rate pairs in $\mathcal{R}_{out}$ satisfy these 
inequalities. 
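
For completeness, the substitution behind (9) and (10) can be written out. This is a short worked step, assuming the inner-bound rate constraints are $R_1 \ge I(X;U,V|Y)$ and $R_2 \ge I(X;U) + I(Y,V;Z|U)$; with $U = Z$ and $V = \emptyset$ they reduce to

$$\begin{aligned}
I(X;U,V|Y) &= I(X;Z|Y), \\
I(X;U) + I(Y,V;Z|U) &= I(X;Z) + I(Y;Z|Z) = I(X;Z),
\end{aligned}$$

since $I(Y;Z|Z) = 0$. The choice $U = Z$ is admissible because the Markov chain $Y - X - Z$ allows $Z$ to be generated from $X$ alone, as the factor $p(u,v|x)$ requires.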